internet data


Skills Made to Order: Efficient Acquisition of Robot Cooking Skills Guided by Multiple Forms of Internet Data

Verghese, Mrinal, Atkeson, Christopher

arXiv.org Artificial Intelligence

This study explores the utility of various internet data sources to select among a set of template robot behaviors to perform skills. Learning contact-rich skills involving tool use from internet data sources has typically been challenging due to the lack of physical information such as contact existence, location, areas, and force in this data. Prior works have generally used internet data and foundation models trained on this data to generate low-level robot behavior. We hypothesize that these data and models may be better suited to selecting among a set of basic robot behaviors to perform these contact-rich skills. We explore three methods of template selection: querying large language models, comparing video of robot execution to retrieved human video using features from a pretrained video encoder common in prior work, and performing the same comparison using features from an optical flow encoder trained on internet data. Our results show that LLMs are surprisingly capable template selectors despite their lack of visual information, optical flow encoding significantly outperforms video encoders trained with an order of magnitude more data, and important synergies exist between various forms of internet data for template selection. By exploiting these synergies, we create a template selector using multiple forms of internet data that achieves a 79% success rate on a set of 16 different cooking skills involving tool-use.
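The video-comparison selection the abstract describes can be illustrated with a minimal sketch: given one feature vector per candidate-template execution video and a feature vector for the retrieved human video, choose the template whose features are most similar. The encoder, the 4-D toy features, and the cosine-similarity metric below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_template(template_features, human_features):
    """Pick the template whose video features best match the human demo.

    template_features: (n_templates, d) array, one feature vector per
    robot-execution video; human_features: (d,) vector for the retrieved
    human video. Cosine similarity is an illustrative choice of metric.
    """
    t = template_features / np.linalg.norm(template_features, axis=1, keepdims=True)
    h = human_features / np.linalg.norm(human_features)
    scores = t @ h  # cosine similarity of each template to the human video
    return int(np.argmax(scores)), scores

# Toy example with made-up 4-D features for three candidate templates.
templates = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.6, 0.8, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
human = np.array([0.7, 0.7, 0.0, 0.0])
best, scores = select_template(templates, human)
print(best)  # index of the best-matching template (here, template 1)
```

In practice the feature vectors would come from the pretrained video or optical flow encoders the paper compares; the selection step itself is this simple nearest-neighbor choice.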


OpenAI, Microsoft face class-action suit over internet data use for AI models

FOX News

Sam Altman, the CEO of artificial intelligence lab OpenAI, told a Senate panel he welcomes federal regulation on the technology "to mitigate" its risks. A class-action complaint filed Wednesday in the northern district of California alleges tech leaders OpenAI and Microsoft Corp. used "stolen and misappropriated" information from hundreds of millions of internet users without their knowledge to train and develop their artificial intelligence products, such as the chatbot ChatGPT. The 16 plaintiffs, who are represented by the Clarkson Law Firm and listed with initials, claimed the defendants "continue to unlawfully collect and feed additional personal data from millions" worldwide to that end and that they systematically scraped 300 billion words from the internet without consent. "Once trained on stolen data, defendants saw the immediate profit potential and rushed the products to market without implementing proper safeguards or controls to ensure that they would not produce or support harmful or malicious content and conduct that could further violate the law, infringe rights and endanger lives," Clarkson continued. "Without these safeguards, the products have already demonstrated their ability to harm humans, in real ways."


A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives

Pielka, Maren, Rode, Felix, Pucknat, Lisa, Deußer, Tobias, Sifa, Rafet

arXiv.org Artificial Intelligence

We analyze two Natural Language Inference data sets with respect to their linguistic features. The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model. To this end, we also investigate the differences between a crowd-sourced, machine-translated data set (SNLI) and a collection of text pairs from internet sources. Our main findings are that the model has difficulty recognizing the semantic importance of prepositions and verbs, emphasizing the importance of linguistically aware pre-training tasks. Furthermore, it often does not comprehend antonyms and homonyms, especially if those depend on the context. Incomplete sentences are another problem, as are longer paragraphs and rare words or phrases. The study shows that automated language understanding requires a more informed approach, utilizing as much external knowledge as possible throughout the training process.


Robots turn racist and sexist with flawed AI, study finds: Neural networks built from biased Internet data teach robots to enact toxic stereotypes

#artificialintelligence

The work, led by Johns Hopkins University, Georgia Institute of Technology, and University of Washington researchers, is believed to be the first to show that robots loaded with an accepted and widely-used model operate with significant gender and racial biases. The work is set to be presented and published this week at the 2022 Conference on Fairness, Accountability, and Transparency (ACM FAccT). "The robot has learned toxic stereotypes through these flawed neural network models," said author Andrew Hundt, a postdoctoral fellow at Georgia Tech who co-conducted the work as a PhD student working in Johns Hopkins' Computational Interaction and Robotics Laboratory. "We're at risk of creating a generation of racist and sexist robots but people and organizations have decided it's OK to create these products without addressing the issues." Those building artificial intelligence models to recognize humans and objects often turn to vast datasets available for free on the Internet.


Clearview AI will get a US patent for its facial recognition tech

Engadget

Clearview AI is about to get formal acknowledgment for its controversial facial recognition technology. Politico reports Clearview has received a US Patent and Trademark Office "notice of allowance" indicating officials will approve a filing for its system, which scans faces across public internet data to find people from government lists and security camera footage. The company just has to pay administrative fees to secure the patent. In a Politico interview, Clearview founder Hoan Ton-That claimed this was the first facial recognition patent involving "large-scale internet data." The firm sells its tool to government clients (including law enforcement) hoping to accelerate searches.


Privacy, altruism, and experience: Estimating the perceived value of Internet data for medical uses

Gefen, Gilie, Ben-Porat, Omer, Tennenholtz, Moshe, Yom-Tov, Elad

arXiv.org Artificial Intelligence

People increasingly turn to the Internet when they have a medical condition. The data they create during this process is a valuable source for medical research and for future health services. However, utilizing these data could come at a cost to user privacy. Thus, it is important to balance the perceived value that users assign to these data with the value of the services derived from them. Here we describe experiments where methods from Mechanism Design were used to elicit a truthful valuation from users for their Internet data and for services to screen people for medical conditions. In these experiments, 880 people from around the world were asked to participate in an auction to provide their data for uses differing in their contribution to the participant, to society, and in the disease they addressed. Some users were offered monetary compensation for their participation, while others were asked to pay to participate. Our findings show that 99% of people were willing to contribute their data in exchange for monetary compensation and an analysis of their data, while 53% were willing to pay to have their data analyzed. The average perceived value users assigned to their data was estimated at US$49. Their value to screen them for a specific cancer was US$22 while the value of this service offered to the general public was US$22. Participants requested higher compensation when notified that their data would be used to analyze a more severe condition. They were willing to pay more to have their data analyzed when the condition was more severe, when they had higher education or if they had recently experienced a serious medical condition.
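The truthful elicitation the abstract attributes to Mechanism Design can be sketched with one classic mechanism: a reverse second-price auction, in which each participant bids the minimum payment they would accept for their data, the k lowest bidders are selected, and each winner is paid the (k+1)-th lowest bid. The paper's exact mechanism may differ; this is an illustrative instance of why such auctions make truthful bidding a dominant strategy.

```python
def reverse_second_price(bids, k):
    """Select the k cheapest sellers and pay each the (k+1)-th lowest bid.

    bids: list of (participant_id, asking_price) pairs. Because the payment
    is set by the first rejected bid rather than by a winner's own bid,
    overstating one's true valuation cannot raise the payment, only risk
    losing; so bidding truthfully is a dominant strategy.
    """
    ranked = sorted(bids, key=lambda b: b[1])
    if len(ranked) <= k:
        raise ValueError("need more bids than winners to set the price")
    price = ranked[k][1]  # first losing bid sets the uniform payment
    winners = [pid for pid, _ in ranked[:k]]
    return winners, price

# Toy example: five participants state the payment they require for their data.
winners, price = reverse_second_price(
    [("a", 30), ("b", 55), ("c", 22), ("d", 49), ("e", 80)], k=3)
print(winners, price)  # ['c', 'a', 'd'] each paid 55 (the first losing bid)
```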


RiskIQ: The Digital Threat Hunter Using AI To Define The Future Of Cyber Security

#artificialintelligence

Cyber Security is a rapidly evolving industry, projected to become a $232 billion global market by 2022. This estimated valuation reflects a significant rise from the $137.8 billion the market reached worldwide in 2017. The emergence of mobile platforms and cloud-based enterprise apps, coupled with the increased adoption of advanced technologies such as fingerprint identification and biometrics, has collectively fueled a notable spike in the space. Although cyber security is attracting greater attention across the globe, the United States stands as the dominant force leading the charge for innovation.


5 Ways Artificial Intelligence Will Impact Smartphones In 2018

#artificialintelligence

Artificial Intelligence (AI) is the next big move in technology. Along with all the other areas it touches, it is expected to bring about changes in the smartphone industry as well. In this article, we have compiled a list of 5 ways in which Artificial Intelligence will impact smartphones in 2018. The market has an ample number of apps that translate text from one language to another. All of these apps use Internet data to upload the text to be translated and then perform the translation.